Methods for prediction and rating aggregation

ABSTRACT

A method determines forecaster biases based on the correlations between prediction errors made by a group of forecasters. The method measures forecaster skills in different areas of expertise based on the correlations between the absolute values of prediction errors made by a group of forecasters. The method uses bias and skill measurements to raise the precision of aggregate forecasts. The method also reduces bias in social network rating systems by linking the ratings with objectively verifiable predictions.

CROSS-REFERENCE TO RELATED APPLICATIONS

Provisional application number: 63/069,304

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

Not Applicable

THE NAMES OF THE PARTIES TO A JOINT RESEARCH AGREEMENT

Not applicable

INCORPORATION-BY-REFERENCE OF MATERIAL SUBMITTED ON A COMPACT DISC

Not applicable

FIELD OF INVENTION

The present disclosure relates to a process of deriving maximally accurate data based on information from multiple sources. More specifically, it relates to a process of aggregating predictions and ratings.

BACKGROUND OF THE INVENTION

When multiple forecasters attempt to predict an unknown parameter, the standard technique for deriving an aggregate prediction is to calculate a weighted average of individual predictions, with higher weights given to predictions made by forecasters known for higher accuracy. As long as individual errors are random, the error of the aggregate prediction tends to decrease as the number of forecasters increases. This technique becomes inadequate when a significant fraction of forecasters have a systematic bias (i.e. the tendency for forecast error to be persistent in one direction).

One method of addressing this problem is to ask forecasters to predict the same parameter for different target dates. For example, economic analysts may be repeatedly asked to predict the next month's value of a stock market index in order to determine whether they are systematically biased towards overly optimistic or overly pessimistic predictions. After the analysts' bias has been measured, an appropriate correction can be made in the future aggregate forecasts. By its nature, this approach is unsuitable for questions that cannot be asked more than once. In addition, it can fail on questions involving more than one psychological bias. For example, overly optimistic economic analysts can transition to making overly pessimistic predictions after the political party they oppose wins elections.

An alternative approach is to try reducing forecaster biases, either through personal training or by providing financial incentives for accuracy. Both methods typically achieve limited results even with significant financial investment.

Neither of the existing techniques is applicable to questions that do not have a clear resolution method. For example, evaluating predictions on the effectiveness of competing policy proposals is generally impossible when at most one of the proposals ends up being implemented. Similarly, the existing techniques are not applicable for debiasing subjective estimates.

BRIEF SUMMARY OF THE INVENTION

One aspect of the present invention addresses these and other problems by providing a computer-implemented method based on the analysis of correlations between errors made by different forecasters. The method uses these correlations to calculate a set of parameters for each forecaster describing the strength of the forecaster's biases. Simultaneously, a set of parameters is derived for each question that describes the sensitivity of prediction values to the change in bias values. With forecaster biases and prediction sensitivity to bias being known, it becomes possible to estimate bias-driven errors on any question and make appropriate corrections to aggregate predictions.

Another aspect of the invention is directed to a method for measuring forecaster competence in different areas of expertise based on the analysis of correlations between the absolute magnitudes of errors made by different forecasters. A set of parameters is further derived that describes sensitivity of prediction errors to forecaster competence. With forecaster competence and prediction sensitivity to competence being known, it becomes possible to estimate the magnitude of random errors on any question and make appropriate corrections to aggregate predictions.

Another aspect of the invention is directed to a method for using objectively verifiable predictions to reduce bias effects in the subjective rating systems used by social networks. The method determines user biases by asking them to make predictions of unknown parameters, and then comparing the predictions to the correct values of the parameters. Based on the correlation between user biases and the rating values, it becomes possible to make appropriate corrections to aggregate ratings.

Additional aspects, applications and advantages will become apparent in view of the following description and associated figures.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram illustrating the method for evaluating forecaster bias.

FIG. 2 is a diagram illustrating the method for evaluating bias in the absence of verified predictions.

FIG. 3 is a diagram illustrating the method for joint processing of predictions and social network ratings.

DETAILED DESCRIPTION OF THE INVENTION

Although the following detailed description contains many specifics for the purposes of illustration, anyone of ordinary skill in the art will appreciate that many variations and alterations to the following details are within the scope of the invention. Accordingly, the following embodiments of the invention are set forth without any loss of generality to, and without imposing limitations upon, the claimed invention.

Common biases of forecasters can be detected and measured by analysing their prediction record, as described in FIG. 1. Comparing predictions to the actual outcomes allows evaluating an error for each prediction. The resulting errors can be decomposed into two parts:

(a) random errors relating to idiosyncratic methods and thought processes of each individual forecaster (the effect of random errors can be mostly eliminated by averaging predictions from a large number of forecasters);

(b) systematic bias-driven errors made by multiple forecasters.

For each question-forecaster pair, the magnitude of the systematic bias-driven error depends on two sets of parameters:

(a) strength and direction of the forecaster's biases;

(b) sensitivity of predictions to forecaster biases for the selected question.

Both sets can be represented as vectors:

(a) a bias vector, {right arrow over (B)}, which measures the strength and direction of forecaster biases;

(b) a controversy vector, {right arrow over (C)}, which for a selected question measures the sensitivity of the predictions to forecaster biases.

Since forecaster biases tend to remain consistent over long periods of time, the errors of forecasters with similar biases are correlated across different questions. Based on these correlations, it is possible to measure the values of both vector sets. In one embodiment, this may be implemented by minimizing a cost function, S₁, with respect to {right arrow over (C)}_(i) and {right arrow over (B)}_(j),

$\begin{matrix} {S_{1} = \sum\limits_{i,j} \left| X_{i} - \left( x_{ij} - \overset{\rightarrow}{C}_{i} \cdot \overset{\rightarrow}{B}_{j} \right) \right|^{2},} & (1) \end{matrix}$

where x_(ij) is the prediction for some unknown parameter i made by forecaster j, X_(i) is the true value of the parameter i, vector {right arrow over (B)}_(j) describes the biases of forecaster j and vector {right arrow over (C)}_(i) describes the sensitivity of predictions to bias for parameter i. The different coordinates of the vectors found by minimization of the cost function correspond to different types of biases that are common among the forecasters. For example, on questions related to politics, the main bias axes may correspond to ideological divisions among forecasters (left vs right, libertarian vs authoritarian, etc.). After the forecasters' bias vectors {right arrow over (B)}_(j) have been found, a debiased aggregate prediction, X_(agg), for any new question whose answer is not yet known can be calculated by minimizing another cost function, S₂, with respect to {right arrow over (C)} and X_(agg),

$\begin{matrix} {S_{2} = \sum\limits_{j} \left| X_{agg} - \left( x_{j} - \overset{\rightarrow}{C} \cdot \overset{\rightarrow}{B}_{j} \right) \right|^{2},} & (2) \end{matrix}$

where x_(j) is the prediction made by forecaster j and {right arrow over (C)} is the predictions' sensitivity to bias for the selected question.
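As an illustration only, the two minimizations in formulas (1) and (2) can be sketched in Python under a simplifying assumption that is not part of the description: the bias and controversy vectors have a single coordinate, and every forecaster has answered every question. With one coordinate, formula (1) can be minimized by alternating least squares, and formula (2) reduces to an ordinary linear regression of the predictions x_(j) on the biases B_(j), whose intercept is X_(agg) and whose slope is C. The function names `fit_bias_rank1` and `debiased_aggregate` are hypothetical.

```python
def fit_bias_rank1(x, X, n_iter=200):
    """Minimize S1 = sum_ij (X_i - (x_ij - C_i*B_j))**2 for scalar C_i, B_j.

    x[i][j] is the prediction of forecaster j on question i and X[i] is the
    true value. Assumption of this sketch: one bias dimension, no missing data.
    """
    n_q, n_f = len(x), len(x[0])
    # Prediction errors e_ij = x_ij - X_i; S1 is then sum (e_ij - C_i*B_j)**2.
    e = [[x[i][j] - X[i] for j in range(n_f)] for i in range(n_q)]
    # Start from each forecaster's mean error (an arbitrary starting point).
    B = [sum(e[i][j] for i in range(n_q)) / n_q for j in range(n_f)]
    C = [1.0] * n_q
    for _ in range(n_iter):
        bb = sum(b * b for b in B)
        if bb == 0:
            break  # no systematic error detected
        # Best C_i for fixed B (least squares, question by question).
        C = [sum(e[i][j] * B[j] for j in range(n_f)) / bb for i in range(n_q)]
        cc = sum(c * c for c in C)
        if cc == 0:
            break
        # Best B_j for fixed C (least squares, forecaster by forecaster).
        B = [sum(e[i][j] * C[i] for i in range(n_q)) / cc for j in range(n_f)]
    return C, B


def debiased_aggregate(x_new, B):
    """Minimize S2 over (X_agg, C) for one new question.

    S2 = sum_j (X_agg - (x_j - C*B_j))**2 is a linear regression of x_j on
    B_j: the slope is C and the intercept is the debiased aggregate X_agg.
    """
    n = len(B)
    mean_b = sum(B) / n
    mean_x = sum(x_new) / n
    var_b = sum((b - mean_b) ** 2 for b in B)
    if var_b == 0:
        return mean_x, 0.0  # all forecasters equally biased: nothing to correct
    C = sum((b - mean_b) * (xj - mean_x) for b, xj in zip(B, x_new)) / var_b
    return mean_x - C * mean_b, C
```

Only the products C_(i)·B_(j) are determined by the fit; the overall scale of the two vector sets is arbitrary, which does not affect the value of X_(agg).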

Another feature of the invention is directed to a method for estimating bias of forecasters whose predictions have not yet been verified. Forecasters and questions are divided into separate sets, as shown in FIG. 2. Set G1 includes forecasters who have made predictions on questions whose answers are already known (set Q1) and whose bias vectors can be calculated directly. Set Q2 includes questions whose answers are not yet known, but for which there exist predictions from forecasters with known biases and for which debiased aggregate predictions can be calculated. Set G2 includes forecasters who have not yet made predictions on questions with known answers (set Q1), but who made predictions on questions with known debiased aggregate predictions (set Q2). For the set G2, it is possible to estimate forecaster biases by comparing individual predictions to the debiased aggregate predictions and treating the differences as proxies for actual forecasting errors. With forecaster biases in the set G2 being thus measured, it becomes possible to debias aggregate predictions for the set of questions Q3 where predictions have been made only by forecasters in the set G2. The debiased aggregate predictions for the set Q3 can be added to the debiased aggregate predictions for the set Q2 to improve the accuracy of bias calculation for the forecasters in the set G2. Similarly, the debiased aggregate predictions for the set Q2 can be combined with the actual outcomes for the questions in the set Q1 to improve the accuracy of bias calculation for the forecasters in the set G1.

In one embodiment, the entire procedure may be implemented by minimizing a single cost function, S₃, with respect to X_(agg,i), {right arrow over (C)}_(i) and {right arrow over (B)}_(j),

$\begin{matrix} {S_{3} = \sum\limits_{i \in Q,j} \left| X_{agg,i} - \left( x_{ij} - \overset{\rightarrow}{C}_{i} \cdot \overset{\rightarrow}{B}_{j} \right) \right|^{2} + \lambda_{Q} \sum\limits_{i \in Q1} \left| X_{agg,i} - x_{i} \right|^{2},} & (3) \end{matrix}$

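where Q is the set of all questions, Q1 is the set of questions whose answers are already known, x_(i) is the known answer to question i in the set Q1, and λ_(Q) is a constant.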
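A minimal sketch of how formula (3) might be minimized is given below. It assumes, for brevity, a single bias coordinate and uses alternating least squares; neither choice is prescribed by the description. Missing predictions are represented by `None`, so that forecasters who answered only questions without known answers (set G2) can be included. The function name `fit_joint` and the argument names are hypothetical, and every question is assumed to have at least one prediction.

```python
def fit_joint(x, truth, lam_q=1.0, n_iter=500):
    """Alternating minimization of S3 with scalar C_i and B_j.

    x[i][j]  : prediction of forecaster j on question i, or None if missing.
    truth[i] : known answer for questions in Q1, None for all other questions.
    Returns (X_agg, C, B).
    """
    n_q, n_f = len(x), len(x[0])
    asked = [[j for j in range(n_f) if x[i][j] is not None] for i in range(n_q)]
    # Start from the known answer where available, else from the plain mean.
    X_agg = [truth[i] if truth[i] is not None
             else sum(x[i][j] for j in asked[i]) / len(asked[i])
             for i in range(n_q)]
    C = [1.0] * n_q
    B = [0.0] * n_f
    for _ in range(n_iter):
        # Forecaster step: best B_j for fixed X_agg and C.
        for j in range(n_f):
            qs = [i for i in range(n_q) if x[i][j] is not None]
            cc = sum(C[i] ** 2 for i in qs)
            if cc > 0:
                B[j] = sum(C[i] * (x[i][j] - X_agg[i]) for i in qs) / cc
        # Question step: best (X_agg_i, C_i) for fixed B, a 2x2 linear system.
        for i in range(n_q):
            js = asked[i]
            lam = lam_q if truth[i] is not None else 0.0
            t = truth[i] if truth[i] is not None else 0.0
            n = len(js) + lam
            sb = sum(B[j] for j in js)
            sbb = sum(B[j] ** 2 for j in js)
            sx = sum(x[i][j] for j in js) + lam * t
            sbx = sum(B[j] * x[i][j] for j in js)
            det = n * sbb - sb * sb
            if abs(det) < 1e-12:
                X_agg[i], C[i] = sx / n, 0.0  # biases carry no information here
            else:
                X_agg[i] = (sx * sbb - sb * sbx) / det
                C[i] = (n * sbx - sb * sx) / det
    return X_agg, C, B
```

In this sketch the λ_(Q) term is what pins the solution down: without questions from the set Q1, a constant shift of all biases could be absorbed into the aggregate predictions.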

Another feature of the invention is directed to a method for scoring forecaster accuracy. Forecasting questions can be divided into two categories:

I. Questions where objectively verifiable answers will become available at some future point (e.g., “How many votes will be won by political party X in the next elections?”).

II. Questions that do not have an accepted resolution method. For example, such questions may involve hypothetical scenarios (e.g., “What would have been the inflation rate had budget proposal X been accepted?”) or assessment of controversial statements (e.g., “What is the probability that claim X is correct?”).

For the questions in the second category, forecaster accuracy cannot be directly evaluated by comparing predictions to the actual outcomes. In settings where forecasters compete against each other or where they attempt to meet a set accuracy target, forecasters' motivation to invest a significant effort into obtaining the correct answer is likely to decline, resulting in larger prediction errors. This problem can be mitigated by using a scoring system that evaluates individual predictions according to their proximity to the debiased aggregate predictions in cases where objectively verifiable answers are not available.
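One possible form of such a scoring system is sketched below. The use of a negative squared error as the score is an assumption of this sketch; the description only requires that the score reflect proximity to the actual outcome when one exists and to the debiased aggregate prediction otherwise. The function name `proximity_score` is hypothetical.

```python
def proximity_score(prediction, outcome=None, debiased_aggregate=None):
    """Score one prediction; higher (closer to zero) is better.

    The actual outcome is used as the reference when it is available;
    otherwise the debiased aggregate prediction stands in for it.
    """
    reference = outcome if outcome is not None else debiased_aggregate
    if reference is None:
        raise ValueError("need an outcome or a debiased aggregate prediction")
    return -(prediction - reference) ** 2
```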

Another feature of the invention is directed to a method for reducing bias effects in rating systems used by social networks. The factors that determine ratings of items on social networks may be divided into three categories:

a) quality of the items;

b) subjective preferences of users;

c) objective biases of users.

For example, ratings of comments and articles on social media are based in part on the objective quality characteristics (e.g., factual accuracy), in part on the subjective preferences of users (e.g., the level of personal interest in the topic under discussion) and in part on the user biases (e.g., the tendency to perceive postings that match users' opinions as more accurate). In order to distinguish subjective preferences from objective biases, and reduce the effect of the latter on the aggregate ratings of items, the ratings can be linked with objectively verifiable predictions, as shown in FIG. 3. In addition to providing the ratings, a fraction of social network users (forecasters) are requested to make predictions of objectively verifiable parameters. Forecasters' biases are then evaluated by comparing their predictions to the ground truth data. After the forecasters' biases become known, the ratings provided by the forecasters are corrected for bias based on the observed correlations between ratings and biases. Based on the debiased ratings, aggregate ratings are calculated for any items rated by the forecasters. By comparing the ratings provided by regular social network users to the debiased aggregate ratings and treating the differences as proxies for actual prediction errors, biases of the regular users can also be estimated. With the regular users' biases being known, all ratings the users provide can be corrected for bias and debiased aggregate ratings can be calculated for any items they rate, including those that were not directly rated by the forecasters. In one embodiment, the calculation may be performed by minimizing a cost function, S₄, with respect to X_(agg,i), r_(agg,i), {right arrow over (M)}_(i), {right arrow over (V)}_(j), {right arrow over (C)}_(i) and {right arrow over (B)}_(j),

$\begin{matrix} {S_{4} = \sum\limits_{i \in Q,j} \left| X_{agg,i} - \left( x_{ij} - \overset{\rightarrow}{C}_{i} \cdot \overset{\rightarrow}{B}_{j} \right) \right|^{2} + \lambda_{r} \sum\limits_{i \in R,j} \left| r_{agg,i} - \left( r_{ij} - \overset{\rightarrow}{C}_{i} \cdot \overset{\rightarrow}{B}_{j} - \overset{\rightarrow}{M}_{i} \cdot \overset{\rightarrow}{V}_{j} \right) \right|^{2} + \lambda_{Q} \sum\limits_{i \in Q1} \left| X_{agg,i} - x_{i} \right|^{2},} & (4) \end{matrix}$

where r_(ij) is the rating for item i by user j, R is the set of all rated items, {right arrow over (V)}_(j) is the vector describing subjective preferences of user j, {right arrow over (M)}_(i) is the vector describing the sensitivity of ratings to subjective preferences for item i, r_(agg,i) is the debiased aggregate rating for item i and λ_(r) is a constant.

The linking between verifiable predictions and ratings can be used for constructing collaborative recommendation systems, where items would be scored and ranked based on their debiased aggregate rating, r_(agg,i), and correspondence to personal preferences of users, {right arrow over (M)}_(i)·{right arrow over (V)}_(j) (the simplest approach may be to rank items based on the sum, r_(agg,i)+{right arrow over (M)}_(i)·{right arrow over (V)}_(j), which equals the expected rating of item i by user j corrected for the user's bias). Alternatively, items may be scored and tagged according to the sensitivity of their ratings to bias, {right arrow over (C)}_(i). In the latter case, high absolute values of {right arrow over (C)}_(i) can signal to users potentially problematic items (e.g., media content that contains factual disinformation).
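The two uses described above can be sketched as follows, assuming the quantities r_(agg,i), {right arrow over (M)}_(i), {right arrow over (V)}_(j) and {right arrow over (C)}_(i) have already been obtained. The function names, the dictionary layout and the `threshold` parameter are hypothetical, and the Euclidean norm is only one possible way to measure the "absolute value" of {right arrow over (C)}_(i).

```python
def dot(u, v):
    """Scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_items_for_user(r_agg, M, V_j):
    """Rank item ids by r_agg_i + M_i . V_j, highest expected rating first.

    r_agg maps item id -> debiased aggregate rating, M maps item id ->
    preference-sensitivity vector, V_j is the preference vector of user j.
    """
    scores = {i: r_agg[i] + dot(M[i], V_j) for i in r_agg}
    return sorted(scores, key=scores.get, reverse=True)


def flag_bias_sensitive(C, threshold):
    """Return ids of items whose controversy vector C_i has a large norm."""
    return [i for i, c in C.items() if dot(c, c) ** 0.5 > threshold]
```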

Another feature of the invention is directed to a method for measuring forecaster competence in different fields of expertise. Since individual performance typically varies across different areas (for example, some forecasters may be consistently better in economic forecasting, others in predicting electoral outcomes, etc.), optimal results may be obtained by defining a forecaster's competence not as a scalar, as is commonly done, but as a vector whose coordinates correspond to competence levels in unrelated areas. The values of the vector can be determined by measuring correlations between the absolute values of prediction errors. In one embodiment, this may be implemented by minimizing the cost function, S₅, with respect to {right arrow over (k)}_(i) and {right arrow over (w)}_(j),

$\begin{matrix} {S_{5} = \frac{\sum_{i,j} \left( \overset{\rightarrow}{w}_{j} \cdot \overset{\rightarrow}{k}_{i} \right) \left| X_{i} - x_{ij} \right|^{2}}{\sum_{i,j} \left( \overset{\rightarrow}{w}_{j} \cdot \overset{\rightarrow}{k}_{i} \right)},} & (5) \end{matrix}$

where {right arrow over (w)}_(j) measures forecaster j competence and {right arrow over (k)}_(i) measures the sensitivity of prediction errors for question i to forecaster competence.
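To make the structure of formula (5) concrete, the following sketch evaluates S₅ for given competence and sensitivity vectors: it is a weighted mean of squared errors in which the weight of each question-forecaster pair is the scalar product of the two vectors. The sketch only evaluates the cost; the choice of optimizer and of any constraints on the vectors (for example, keeping the weights positive) is left open here. The function name `competence_cost` is hypothetical.

```python
def competence_cost(x, X, w, k):
    """Evaluate S5 for predictions x[i][j], true values X[i],
    forecaster competence vectors w[j] and question sensitivity vectors k[i]."""
    num = 0.0
    den = 0.0
    for i, row in enumerate(x):
        for j, x_ij in enumerate(row):
            weight = sum(a * b for a, b in zip(w[j], k[i]))  # w_j . k_i
            num += weight * (X[i] - x_ij) ** 2
            den += weight
    return num / den
```

With equal weights the function returns the plain mean squared error; shifting weight toward forecasters with smaller errors lowers the value.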

In an alternative embodiment the method for measuring forecaster competence can be combined with previously described methods of bias mitigation and rating processing. This can be implemented by minimizing the cost function, S₆,

$\begin{matrix} {S_{6} = \frac{\sum_{i \in Q,j} \left( \overset{\rightarrow}{w}_{j} \cdot \overset{\rightarrow}{k}_{i} \right) \left| X_{agg,i} - \left( x_{ij} - \overset{\rightarrow}{C}_{i} \cdot \overset{\rightarrow}{B}_{j} \right) \right|^{2}}{\sum_{i \in Q,j} \left( \overset{\rightarrow}{w}_{j} \cdot \overset{\rightarrow}{k}_{i} \right)} + \frac{\lambda_{r} \sum_{i \in R,j} \left( \overset{\rightarrow}{w}_{j} \cdot \overset{\rightarrow}{k}_{i} \right) \left| r_{agg,i} - \left( r_{ij} - \overset{\rightarrow}{C}_{i} \cdot \overset{\rightarrow}{B}_{j} - \overset{\rightarrow}{M}_{i} \cdot \overset{\rightarrow}{V}_{j} \right) \right|^{2}}{\sum_{i \in R,j} \left( \overset{\rightarrow}{w}_{j} \cdot \overset{\rightarrow}{k}_{i} \right)} + \lambda_{Q} \sum\limits_{i \in Q1} \left| X_{agg,i} - x_{i} \right|^{2},} & (6) \end{matrix}$

with respect to X_(agg,i), r_(agg,i), {right arrow over (k)}_(i), {right arrow over (M)}_(i), {right arrow over (V)}_(j), {right arrow over (C)}_(i) and {right arrow over (B)}_(j).

Another feature of the invention is directed to a method for using supplementary data to improve the accuracy of the calculations. In some cases, there may exist additional information about forecasters or social network users that is correlated with their competence, bias or personal preferences. Common examples of such information include:

(a) personal characteristics (for example, the education level may be highly predictive of a person's competence in some areas of expertise);

(b) ratings of user-made items submitted by other users (for example, comments of ideologically biased users are likely to be more highly rated by users with the same bias).

Similarly, there may exist additional information about prediction questions and rated items that is correlated with their aggregate ratings or their sensitivity to competence, bias or personal preferences. Common examples of such information include:

(a) individual characteristics of questions or items (for example, a question's topic category may be highly predictive of its sensitivity to competence in some areas of expertise);

(b) data on users who produced the items (for example, aggregate ratings for items previously produced by the users may be highly predictive of aggregate ratings of their future items).

This additional information can be combined with prediction and rating data to improve the accuracy of the calculations. In one embodiment, this may be implemented by adding a regularization term to a cost function,

$\begin{matrix} {\Delta S = \lambda_{C} \sum\limits_{i} \left( \overset{\rightarrow}{C}_{i} - \langle \overset{\rightarrow}{C} \rangle \right)^{2} + \lambda_{M} \sum\limits_{i} \left( \overset{\rightarrow}{M}_{i} - \langle \overset{\rightarrow}{M} \rangle \right)^{2} + \lambda_{k} \sum\limits_{i} \left( \overset{\rightarrow}{k}_{i} - \langle \overset{\rightarrow}{k} \rangle \right)^{2} + \lambda_{B} \sum\limits_{j} \left( \overset{\rightarrow}{B}_{j} - \langle \overset{\rightarrow}{B} \rangle \right)^{2} + \lambda_{V} \sum\limits_{j} \left( \overset{\rightarrow}{V}_{j} - \langle \overset{\rightarrow}{V} \rangle \right)^{2} + \lambda_{w} \sum\limits_{j} \left( \overset{\rightarrow}{w}_{j} - \langle \overset{\rightarrow}{w} \rangle \right)^{2},} & (7) \end{matrix}$

where <{right arrow over (C)}>, <{right arrow over (M)}>, <{right arrow over (k)}>, <{right arrow over (B)}>, <{right arrow over (V)}> and <{right arrow over (w)}> are the average values of vectors {right arrow over (C)}, {right arrow over (M)}, {right arrow over (k)}, {right arrow over (B)}, {right arrow over (V)} and {right arrow over (w)}, respectively, and λ_(C), λ_(M), λ_(k), λ_(B), λ_(V) and λ_(w) are constants. When additional information is available that allows forecasters and users who provide the ratings to be split into separate groups (for example, by their education levels), the calculation accuracy may be improved by replacing the global averages, <{right arrow over (B)}>, <{right arrow over (V)}> and <{right arrow over (w)}>, with the average values for each group. Similarly, when questions and rated items can be split into different groups (for example, by topic), the calculation accuracy may be improved by replacing the global averages, <{right arrow over (C)}>, <{right arrow over (k)}> and <{right arrow over (M)}>, with the average values for each group.
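A single term of the regularization in formula (7) might be computed as in the sketch below, which covers both the global average (all vectors placed in one group) and the per-group averages. The function name `group_regularizer` and the representation of groups as a list of labels are hypothetical.

```python
def group_regularizer(vectors, groups, lam):
    """Return lam * sum_i |v_i - <v>_group(i)|**2.

    vectors[i] is one parameter vector (for example a bias vector B_j) and
    groups[i] is the label of the group it belongs to. Passing the same
    label for every vector reproduces the global-average form.
    """
    members = {}
    for v, g in zip(vectors, groups):
        members.setdefault(g, []).append(v)
    # Coordinate-wise mean of the vectors in each group.
    means = {g: [sum(col) / len(vs) for col in zip(*vs)]
             for g, vs in members.items()}
    return lam * sum(
        sum((a - m) ** 2 for a, m in zip(v, means[g]))
        for v, g in zip(vectors, groups)
    )
```

When the groups are informative, vectors cluster around their group averages and the penalty is smaller than with the global average, so the term pulls poorly determined vectors toward the typical values of similar forecasters or questions.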

The optimal number of vector dimensions in the above formulas and the optimal values for constants λ_(C), λ_(Q), λ_(r), λ_(M), λ_(k), λ_(B), λ_(V) and λ_(w) can be determined by using the k-fold cross-validation method. The ground truth data for the Q1 question set is randomly divided into k equal size sets. One by one, each set is selected as a test set: the calculation is performed with the actual outcomes of that set withheld, and the accuracy of the resulting aggregate predictions for the set is evaluated against the actual outcomes. Then the constants and the number of dimensions that result in the highest average accuracy are selected.
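The selection loop can be sketched as follows. The callback `evaluate(train_ids, test_ids, params)` is a hypothetical placeholder for fitting the model with the outcomes of `test_ids` withheld and returning the error of the aggregate predictions on those questions; `candidates` is a hypothetical list of settings to compare (constants and number of dimensions).

```python
import random


def k_fold_select(q1_ids, candidates, evaluate, k=5, seed=0):
    """Pick the candidate setting with the lowest mean test error.

    q1_ids     : ids of questions with known outcomes (set Q1).
    candidates : settings to compare, in any form `evaluate` understands.
    evaluate   : hypothetical callback (train_ids, test_ids, params) -> error.
    """
    ids = list(q1_ids)
    random.Random(seed).shuffle(ids)
    # k folds of (nearly) equal size.
    folds = [ids[f::k] for f in range(k)]
    best, best_err = None, float("inf")
    for params in candidates:
        errors = []
        for f in range(k):
            test_ids = folds[f]
            train_ids = [q for g in range(k) if g != f for q in folds[g]]
            errors.append(evaluate(train_ids, test_ids, params))
        mean_err = sum(errors) / k
        if mean_err < best_err:
            best, best_err = params, mean_err
    return best, best_err
```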

The accuracy of the above methods may be improved by mapping predictions on different questions, x_(ij), into a single normal distribution prior to any further processing. The mapping can be performed using the following procedure: (a) for each parameter i rank all the values of x_(ij) from the largest to the smallest; (b) if a certain value is encountered multiple times use an average rank for each value (for example, if the value 0.5 is encountered 3 times and there are 20 values larger than 0.5, use the average rank, 22=(21+22+23)/3); (c) apply the transformation x_(ij,new)=√{square root over (2)} erf⁻¹(2n_(ij)/n_(total,i)−1), where erf⁻¹ is the inverse error function, n_(ij) is the rank of the prediction x_(ij) and n_(total,i) is the total number of predictions made for parameter i. The values of aggregate predictions, X_(agg,i), calculated in the new units can be mapped back into original units by finding the nearest x_(ij,new) values and applying linear interpolation.
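Steps (a) to (c) can be sketched with the Python standard library, using the identity √2·erf⁻¹(2p−1) = Φ⁻¹(p), where Φ⁻¹ is the inverse of the standard normal cumulative distribution function. One deviation from the formula as written is made and should be treated as an assumption of this sketch: 0.5 is subtracted from each rank, because the last rank n_(ij) = n_(total,i) would otherwise give erf⁻¹(1), which is infinite. The function names are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """Rank values from largest (rank 1) to smallest; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda idx: -values[idx])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2  # mean of ranks pos+1 .. end+1
        for idx in order[pos:end + 1]:
            ranks[idx] = mean_rank
        pos = end + 1
    return ranks


def to_normal_scores(values):
    """Map the predictions for one parameter onto standard normal scores.

    Uses Phi^-1((n - 0.5) / n_total); the 0.5 offset is an assumption of this
    sketch that keeps the score of the last rank finite. Because rank 1 is
    the largest value, larger predictions receive lower scores; the mapping
    is still one-to-one for distinct values, so it can be inverted.
    """
    n_total = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((n - 0.5) / n_total) for n in average_ranks(values)]
```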

All the above methods are not restricted to individual forecasters, but can be applied to any information source, including forecasting teams, public organizations, machines or any group comprising people and machines. Likewise, the methods are not restricted to forecasting information related to the future events and processes, but can also be applied to estimating any parameters whose values are presently unknown or controversial.

It will be clear to one skilled in the art that the above embodiments may be altered in many ways without departing from the scope of the invention. Accordingly, the scope of the invention should be determined by the following claims and their legal equivalents. 

What is claimed is:
 1. A computer implemented method, comprising: obtaining individual estimates for a set of numerical parameters from a plurality of information sources, wherein an information source may be a person, a group, an organization or any other entity that can provide information; comparing at least some of the estimates to the correct values of at least some of the parameters; evaluating biases of the information sources based on correlations between estimate errors of the information sources.
 2. The method of claim 1, further including: measuring sensitivity of the estimates to the biases of the information sources.
 3. The method of claim 2, further including: calculating aggregate estimates for the parameters based on the individual estimates, the biases of the information sources and the sensitivity of the estimates to the biases of the information sources.
 4. The method of claim 3, further including: measuring differences between the individual and aggregate estimates for parameters whose correct values are unknown; adjusting the bias values based on the measured differences.
 5. The method of claim 3, further including: measuring differences between the individual and aggregate estimates for parameters whose correct values are unknown; using the measured differences to score accuracy of the information sources.
 6. The method of claim 1, further including: obtaining ratings for a plurality of items from a plurality of persons; calculating, based on the ratings and bias values of persons who provided the ratings, aggregate ratings for the items.
 7. The method of claim 6, further including: measuring differences between the individual and aggregate ratings; adjusting the bias values of the persons who provided the ratings based on the measured differences.
 8. The method of claim 1, further including: obtaining ratings for a plurality of items from a plurality of persons; scoring person-item pairs based on the ratings and bias values of the persons who provided the ratings.
 9. A computer implemented method, comprising: obtaining individual estimates for a set of numerical parameters from a plurality of information sources, wherein an information source may be a person, a group, an organization or any other entity that can provide information; comparing at least some of the estimates to the correct values of at least some of the parameters; evaluating competence of the information sources based on correlations between absolute values of estimate errors of the information sources.
 10. The method of claim 9, further including: measuring sensitivity of the estimates to the competence of the information sources.
 11. The method of claim 10, further including: calculating aggregate estimates for the parameters based on the individual estimates, the competence of the information sources and the sensitivity of the estimates to the competence of the information sources.
 12. The method of claim 11, further including: measuring differences between the individual and aggregate estimates for parameters whose correct values are unknown; adjusting the competence values based on the measured differences.
 13. The method of claim 11, further including: measuring differences between individual and aggregate estimates for parameters whose correct values are unknown; using the measured differences to score accuracy of the information sources.
 14. The method of claim 9, further including: obtaining ratings for a plurality of items from a plurality of persons; calculating, based on the ratings and competence values of persons who provided the ratings, aggregate ratings for the items.
 15. The method of claim 14, further including: measuring differences between the individual and aggregate ratings; adjusting the competence values based on the measured differences.
 16. The method of claim 9, further including: obtaining ratings for a plurality of items from a plurality of persons; scoring person-item pairs based on the ratings and competence values of the persons who provided the ratings.
 17. A computer implemented method, comprising: obtaining a plurality of item ratings from social network users; obtaining estimates for a set of parameters from at least some of the users; obtaining correct values of at least some of the parameters; assigning scores to the users based on the differences between the parameter estimates submitted by the users and the correct values of the parameters; assigning scores to the items based on their ratings and the scores of the users who provided the ratings.
 18. The method of claim 17, further including: adjusting the user scores based on at least one of (a) ratings of items made by the users, (b) the difference between item ratings submitted by the users and aggregate ratings of the same items, or (c) individual characteristics of the users that correlate with the user scores.
 19. The method of claim 17, further including: adjusting the item scores based on at least one of (a) scores of the users who produced the items, or (b) individual characteristics of the items that correlate with the item scores. 